Search CORE

18 research outputs found

CoPub update: CoPub 5.0 a text mining system to answer biological questions

Author: Alako
B. Heupers
Chen
Frijters
Ideker
J. de Vlieg
J. Polman
Pico
R. Frijters
R. van Schaik
S. Verhoeven
Sharan
W. Alkema
W. W. M. Fleuren
Publication venue: Oxford University Press
Publication date: 01/01/2011

In this article, we present CoPub 5.0, a publicly available text mining system that uses Medline abstracts to calculate robust statistics for keyword co-occurrences. CoPub was initially developed for the analysis of microarray data, but we broadened its scope by implementing new technology and new thesauri. In CoPub 5.0, we integrated existing CoPub technology with new features and provided a new advanced interface that can be used to answer a variety of biological questions. CoPub 5.0 allows users to search for keywords of interest and their relations to curated thesauri, and it provides highlighting and sorting mechanisms, based on its statistics, to retrieve the most important abstracts in which the terms co-occur. It also provides a way to search for indirect relations between genes, drugs, pathways and diseases, following an ABC principle, in which A and C have no direct connection but are connected via shared B intermediates. With CoPub 5.0, it is possible to create, annotate and analyze networks using the layout and highlight options of Cytoscape Web, allowing for literature-based systems biology. Finally, the operations of the CoPub 5.0 Web service make it possible to integrate CoPub technology into bioinformatics workflows. CoPub 5.0 can be accessed through the CoPub portal at http://www.copub.org
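The abstract says CoPub computes statistics over keyword co-occurrences in Medline abstracts but does not give the formula here. A minimal sketch of the counting step, assuming thesaurus terms have already been recognized in each abstract, is shown below. The `lift` ratio (observed over expected co-occurrence under independence) is a generic illustration, not CoPub's own statistic, and the function names and example terms are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(abstracts):
    """Count single-term and pairwise document frequencies.

    `abstracts` is an iterable of sets of thesaurus terms already
    recognized in each abstract (term recognition is assumed done).
    Returns (number of abstracts, term counts, pair counts).
    """
    term_df = Counter()
    pair_df = Counter()
    n_docs = 0
    for terms in abstracts:
        n_docs += 1
        unique = sorted(set(terms))
        term_df.update(unique)
        # Each unordered pair is counted at most once per abstract;
        # pairs are keyed in sorted order.
        pair_df.update(combinations(unique, 2))
    return n_docs, term_df, pair_df


def lift(a, b, n_docs, term_df, pair_df):
    """Observed / expected co-occurrence under independence (illustrative only)."""
    key = tuple(sorted((a, b)))
    expected = term_df[a] * term_df[b] / n_docs
    return pair_df[key] / expected if expected else 0.0
```

For example, over the two abstracts `{"TP53", "apoptosis"}` and `{"TP53", "apoptosis", "MDM2"}`, the pair `("TP53", "apoptosis")` is counted twice; sorting the pair keys means each unordered pair has exactly one entry.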

mspecLINE: bridging knowledge of human disease with the proteome

Author: AM Cohen
B Ye
BJ Stapley
BT Alako
C Bennett
CC van der Eijk
DJ Slotta
E Keogh
Eric W Deutsch
EW Deutsch
F Desiere
H Liao
H Liu
HJ Lowe
J Boyle
J Saltz
Jeremy Handcock
John Boyle
M Li
MY Brusniak
P Khatri
P Mallick
P Picotti
P Shannon
PA Covitz
R Cilibrasi
R Homayouni
RL Cilibrasi
S Deerwester
V Lange
Y Tsuruoka
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Public proteomics databases such as PeptideAtlas contain peptides and proteins identified in mass spectrometry experiments. However, these databases lack information about human disease for researchers studying disease-related proteins. We have developed mspecLINE, a tool that combines knowledge about human disease in MEDLINE with empirical data about the detectable human proteome in PeptideAtlas. mspecLINE associates diseases with proteins by calculating the semantic distance between annotated terms from a controlled biomedical vocabulary. We used an established semantic distance measure that is based on the co-occurrence of disease and protein terms in the MEDLINE bibliographic database.

Results: The mspecLINE web application allows researchers to explore relationships between human diseases and parts of the proteome that are detectable using a mass spectrometer. Given a disease, the tool displays proteins and peptides from PeptideAtlas that may be associated with the disease. It also displays relevant literature from MEDLINE. Furthermore, mspecLINE allows researchers to select proteotypic peptides for specific protein targets in a mass spectrometry assay.

Conclusions: Although mspecLINE applies an information retrieval technique to the MEDLINE database, it is distinct from previous MEDLINE query tools in that it combines the knowledge expressed in scientific literature with empirical proteomics data. The tool provides valuable information about candidate protein targets to researchers studying human disease and is freely available on a public web server.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central
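The mspecLINE abstract relies on "an established semantic distance measure" based on disease/protein co-occurrence in MEDLINE without naming it. One well-known measure of that kind is the Normalized Google Distance (NGD) of Cilibrasi and Vitányi; the sketch below assumes NGD for illustration only, with document counts supplied by the caller. Smaller values mean the two terms co-occur more often than their individual frequencies would suggest.

```python
import math


def normalized_google_distance(fx, fy, fxy, n):
    """NGD between two terms from document counts (assumed measure).

    fx, fy: number of documents mentioning each term;
    fxy: number of documents mentioning both;
    n: total number of documents indexed.
    Returns math.inf when the two terms never co-occur.
    """
    if fx <= 0 or fy <= 0:
        raise ValueError("both terms must occur at least once")
    if fxy == 0:
        return math.inf
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    # NGD = (max(log fx, log fy) - log fxy) / (log n - min(log fx, log fy))
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))
```

The logarithm base cancels in the ratio, so natural logarithms are used. To rank candidate proteins for a disease, one would compute the distance from the disease term to each protein term and sort in ascending order.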

Improving protein function prediction methods with integrated literature data

Author: A Karimpour-Fard
A Vazquez
A Vinayagam
Aaron P Gabow
AK Ramani
B Schwikowski
BTF Alako
C Brun
C von Mering
Debra S Goldberg
E Nabieva
HW Mewes
I Xenarios
J Rual
K Tsuda
L Hunter
L Tanabe
Lawrence E Hunter
M Ashburner
M Aubry
M Chagoyen
M Huynen
M Krallinger
M Pelligri
M Yetisgen-Yildiz
OG Troyanskaya
P Srinivasan
PM Bowers
R Cilibrasi
R Hoffmann
S Letovsky
S Raychaudhuri
Sonia M Leach
T Schlitt
T Tanabe
TK Jenssen
U Karaoz
William A Baumgartner
Y Ofran
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Determining the function of uncharacterized proteins is a major challenge in the post-genomic era because of the problem's complexity and scale. Identifying a protein's function contributes to an understanding of its role in the involved pathways, its suitability as a drug target, and its potential for protein modifications. Several graph-theoretic approaches predict unidentified functions of proteins by using the functional annotations of better-characterized proteins in protein-protein interaction networks. We systematically consider the use of literature co-occurrence data, introduce a new method for quantifying the reliability of co-occurrence, and test how performance differs across species. We also quantify changes in performance as the prediction algorithms annotate with increased specificity.

Results: We find that including information on the co-occurrence of proteins within an abstract greatly boosts the performance of the Functional Flow graph-theoretic function prediction algorithm in yeast, fly and worm. This increase in performance is not simply due to the presence of additional edges, since supplementing protein-protein interactions with co-occurrence data outperforms supplementing them with a comparably sized genetic interaction dataset. Through the combination of protein-protein interactions and co-occurrence data, the neighborhood around unknown proteins is quickly connected to well-characterized nodes, which global prediction algorithms can exploit. Our method for quantifying co-occurrence reliability outperforms the other methods, particularly at threshold values around 10%, which yield the best trade-off between coverage and accuracy. In contrast, the traditional way of asserting co-occurrence whenever at least one abstract mentions both proteins proves to be the worst method for generating co-occurrence data, introducing too many false positives. Annotating functions with greater specificity is harder, but co-occurrence data still proves beneficial.

Conclusion: Co-occurrence data is a valuable supplemental source for graph-theoretic function prediction algorithms. A rapidly growing literature corpus ensures that co-occurrence data is a readily available resource for nearly every studied organism, particularly those with small protein interaction databases. Though arguably biased toward known genes, co-occurrence data provides critical additional links to well-studied regions of the interaction network that graph-theoretic function prediction algorithms can exploit.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central
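The paper uses the Functional Flow algorithm, which is not reproduced here. To show why adding co-occurrence edges helps, the sketch below swaps in a much simpler neighbor-majority vote over a merged network: an unknown protein with no interaction partners gains annotated neighbors once co-occurrence edges are added. The function names, protein IDs and function labels are hypothetical.

```python
from collections import Counter, defaultdict


def build_graph(*edge_lists):
    """Merge several undirected edge lists (e.g. PPI and co-occurrence)."""
    graph = defaultdict(set)
    for edges in edge_lists:
        for a, b in edges:
            if a != b:  # ignore self-loops
                graph[a].add(b)
                graph[b].add(a)
    return graph


def predict_by_neighbor_vote(graph, annotations, protein):
    """Return the functions annotated on `protein`'s neighbors, most common first.

    This is a simple majority vote, not Functional Flow.
    """
    votes = Counter()
    for neighbor in graph.get(protein, ()):
        votes.update(annotations.get(neighbor, ()))
    return [func for func, _ in votes.most_common()]
```

For example, with `ppi = [("P1", "P2")]` and `cooc = [("P3", "P2"), ("P3", "P4")]`, protein `P3` gets no prediction from `build_graph(ppi)` but inherits its new neighbors' annotations from `build_graph(ppi, cooc)`.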

The COMPARE Data Hubs

Author: Aarestrup F.M. (Frank)
Alako B.T.F. (Blaise)
Amid C. (Clara)
Belka A. (Ariane)
Caccio S. (Simone)
Cisneros J.L.B. (Jose L B)
Cochrane G. (Guy)
Cotten M. (Matthew)
Csabai I. (István)
Dos S Ribeiro C. (Carolina)
Dynovski L.D. (Lukasz D.)
Haringhuizen G.B. (George B.)
Harrison P.W. (Peter W.)
Holt S. (Sam)
Hundahl C. (Camilla)
Hussein A. (Abdulrahman)
Höper D. (Dirk)
Jayathilaka S. (Suran)
Kaas R.S. (Rolf S.)
Koopmans D.V.M. M.P.G. (Marion)
Kroneman A. (Annelies)
Leinonen R. (Rasko)
Liu X. (Xin)
Lund O. (Ole)
Malhotra-Kumar S. (Surbhi)
Nieuwenhuijse D.F. (David F.)
Pakseresht N. (Nima)
Pataki B.Á. (Bálint Á)
Rahman N. (Nadim)
Schmitz D. (Dennis)
Silvester N. (Nicole)
Skiby J.E. (Jeffrey E.)
Stéger J. (József)
Szalai-Gindl J.M. (János M)
Thomsen M.C.F. (Martin C F)
Visontai D. (Dávid)
Xavier B.B. (Basil Britto)
Publication venue: Oxford University Press (OUP)
Publication date: 23/12/2019

Data sharing enables research communities to exchange findings and build upon the knowledge that arises from their discoveries. Public and animal health, as well as food safety, would benefit from rapid data sharing in emergencies. However, ethical, regulatory and institutional challenges, together with a lack of suitable platforms that provide an infrastructure for sharing data in structured formats, often lead to data not being shared at all, or at most being shared as supplementary material in journal publications. Here, we describe an informatics platform that includes workflows for the structured storage, management and pre-publication sharing of pathogen sequencing data and the interpretations of its analyses with relevant stakeholders.

Erasmus University Digital Repository

Literature Mining for the Discovery of Hidden Connections between Drugs, Genes and Diseases

Author: AA Morgan
AC Nicholson
AJ Perez
Andrey Rzhetsky
AP Weetman
B Dell'Osso
B Rapoport
B Vaidya
BA Imhof
BT Alako
C Blaschke
C Nielsen
C Puozzo
CJ McDougle
CR Faltynek
D Chaussabel
D Denys
D Hristovski
D Olive
D Shao
DB Kell
DR Swanson
E Yung
EC Butcher
GR Hajer
H Kakeya
H Shatkay
HP Fischer
I Kola
J Han
J Kuhlmann
JA Wagner
Jacob de Vlieg
JD Wren
K Kajinami
K Miguita
K Njung'e
K Tomiyama
K Vandenborre
L Prokunina
LJ Jensen
M Briley
M Campillos
M Hayashi
M Imoto
M Inazu
M Kamata
M Sugiyama
M Yetisgen-Yildiz
MA Andrade
Marianne van Vugt
N Daraselia
NR Smalheiser
PD Pelton
PR Newby
R Frijters
R Homayouni
R Jelier
RA DiGiacomo
Raoul Frijters
René van Schaik
Ruben Smeets
RY Mukhtar
S Gordon
S Morikawa
S Raychaudhuri
SN Vaishnavi
SS Fuller
T Fawcett
T Hiramatsu
T Ito
T Shokawa
T Tabata
TK Jenssen
TT Ashburn
U Kaneyuki
WA Colburn
WK Goodman
Wynand Alkema
Y Ichimaru
Y Sugimoto
Y Tamori
Publication venue: Public Library of Science
Publication date: 01/01/2010

The scientific literature represents a rich source for retrieval of knowledge on associations between biomedical concepts such as genes, diseases and cellular processes. A commonly used method to establish relationships between biomedical concepts from the literature is co-occurrence. Apart from its use in knowledge retrieval, the co-occurrence method is also well suited to discovering new, hidden relationships between biomedical concepts following a simple ABC principle, in which A and C have no direct relationship but are connected via shared B intermediates. In this paper we describe CoPub Discovery, a tool that mines the literature for new relationships between biomedical concepts. Statistical analysis using ROC curves showed that CoPub Discovery performed well over a wide range of settings and keyword thesauri. We subsequently used CoPub Discovery to search for new relationships between genes, drugs, pathways and diseases. Several of the newly found relationships were validated using independent literature sources. In addition, new predicted relationships between compounds and cell proliferation were validated and confirmed experimentally in an in vitro cell proliferation assay. The results show that CoPub Discovery is able to identify novel associations between genes, drugs, pathways and diseases that have a high probability of being biologically valid. This makes CoPub Discovery a useful tool to unravel the mechanisms behind disease, to find novel drug targets, or to find novel applications for existing drugs.

Public Library of Science (PLOS)

Directory of Open Access Journals

PubMed Central

Radboud Repository
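The ABC principle in the CoPub Discovery abstract can be sketched as a two-hop search: collect the B terms directly linked to A, then keep every C reached through some B that has no direct link to A. This minimal sketch assumes a precomputed, symmetric co-occurrence map and ranks C terms simply by the number of shared B intermediates; CoPub Discovery's own statistical scoring is not reproduced, and the function name and term names are hypothetical.

```python
from collections import defaultdict


def abc_candidates(direct, a):
    """Rank C terms linked to A only through shared B intermediates.

    `direct` maps each term to the set of terms it co-occurs with
    (assumed symmetric). Returns (C, shared_Bs) pairs, most shared Bs
    first, ties broken alphabetically.
    """
    b_terms = direct.get(a, set())
    shared = defaultdict(set)
    for b in b_terms:
        for c in direct.get(b, set()):
            # Keep only C terms with no direct link to A.
            if c != a and c not in b_terms:
                shared[c].add(b)
    return sorted(shared.items(), key=lambda item: (-len(item[1]), item[0]))
```

A real system would also require each A-B and B-C link to pass a significance threshold before it enters `direct`; here every co-occurrence counts equally.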

A Major Role for the Plasmodium falciparum ApiAP2 Protein PfSIP2 in Chromosome End Biology

Author: A Scherf
A Taddei
AA Franco
AE Ehrenhofer-Murray
AJ Marty
AM Salcedo-Amaya
B van Steensel
BA Moser
Blaise T. F. Alako
BW Mok
C Flueck
C Lambros
C Lavazec
Christian Flueck
CJ Tonkin
CL Gatlin
D Moazed
DI Baruch
DJ Clarke
E Pongponratn
EJ Louis
EK De Silva
FE Pryde
G Hu
GG MacPherson
H Van Attikum
Hendrik G. Stunnenberg
Igor Niederwieser
J Kanoh
JA Rowe
JA Young
JC Reeder
JD Barry
JD Smith
JF Diffley
JG Beeson
JJ Lopez-Rubio
JP Cooper
JP Gardner
K Labib
K Perez-Toledo
Kami Kim
Kathrin Witmer
L Aravind
LH Freitas-Junior
LM Figueiredo
LM Iyer
M Gissot
M Llinas
M Niang
M Ohme-Takagi
M Sadaie
M Yuda
MJ Gardner
MN Conrad
MT Duraisingh
N Collins
NS Moon
O Cinquin
P Horrocks
P Venditti
Paul Jenoe
PB Singh
PM Burgers
R Mossi
RC Conaway
Richard Bartfai
RM Coulson
RW Snow
S Balaji
S Nole-Wilson
S Yu
SA Ralph
SH Reed
SI Grewal
SM Gasser
SS Baker
Suzette Moes
T Chookajorn
Till S. Voss
TS Voss
V Dror
VA Zakian
W Trager
WH Tham
XZ Su
Z Bozdech
Zbynek Bozdech
Publication venue: Public Library of Science
Publication date: 01/01/2010

The heterochromatic environment and physical clustering of chromosome ends at the nuclear periphery provide a functional and structural framework for antigenic variation and evolution of subtelomeric virulence gene families in the malaria parasite Plasmodium falciparum. While recent studies assigned important roles to reversible histone modifications, silent information regulator 2 and heterochromatin protein 1 (PfHP1) in the epigenetic control of variegated expression, the factors involved in the recruitment and organization of subtelomeric heterochromatin remain unknown. Here, we describe the purification and characterization of PfSIP2, a member of the ApiAP2 family of putative transcription factors, as the unknown nuclear factor interacting specifically with cis-acting SPE2 motif arrays in subtelomeric domains. Interestingly, SPE2 is bound not by the full-length protein but by a 60 kDa N-terminal domain, PfSIP2-N, which is released during schizogony. Our experimental re-definition of the SPE2/PfSIP2-N interaction highlights the strict requirement of both adjacent AP2 domains and a conserved bipartite SPE2 consensus motif for high-affinity binding. Genome-wide in silico mapping identified 777 putative binding sites, 94% of which cluster in heterochromatic domains upstream of subtelomeric var genes and in telomere-associated repeat elements. Immunofluorescence and chromatin immunoprecipitation (ChIP) assays revealed co-localization of PfSIP2-N with PfHP1 at chromosome ends. Genome-wide ChIP demonstrated that in vivo PfSIP2-N binds exclusively to subtelomeric SPE2 landmarks and not to single chromosome-internal sites. Consistent with this specialized distribution pattern, PfSIP2-N over-expression has no effect on global gene transcription. Hence, contrary to the previously proposed role for this factor in gene activation, our results provide the first strong evidence for the involvement of an ApiAP2 factor in heterochromatin formation and genome integrity. These findings are highly relevant for our understanding of chromosome end biology and variegated expression in P. falciparum and other eukaryotes, and for future analysis of the role of ApiAP2-DNA interactions in parasite biology.

Public Library of Science (PLOS)

Crossref

LSHTM Research Online

edoc

Directory of Open Access Journals

PubMed Central

Radboud Repository

Linking genes to literature: text mining, information extraction, and retrieval applications for biology

Author: A Divoli
A Doms
A Mitchell
A Sood
Alfonso Valencia
B Alako
B Carpenter
B Settles
BR Haynes
C Batchelor
C Blaschke
C Nedellec
C Rodriguez-Penagos
C Sneiderman
D Chen
D Hanisch
D Koning
D Oliver
D Rebholz-Schuhmann
D Searls
D Wheeler
E Camon
F Couto
G Divita
G Gomez-Lopez
G Grimes
G Poulter
H Che
H Liu
H Mangalam
H Shatkay
H Yu
I Iliopoulos
I Sarkar
J Baumgartner
J Caporaso
J Chang
J Hakenberg
J Lewis
J Tamames
J Wilbur
J Wren
K Frantzi
K Mane
K Tomanek
L Chen
L Hunter
L Smith
L Tanabe
Lynette Hirschman
M Ashburner
M Craven
M Errami
M Falagas
M Fattore
M Galperin
M Huang
M Krallinger
M Krauthammer
M Muin
M Ongenaert
M Porter
M Shultz
M Synnestvedt
M Weeber
MA Andrade
Martin Krallinger
MJ Schuemie
N Okazaki
N Smalheiser
P Fontelo
P Leary
P Roberts
Q Tu
R Grishman
R Hoffmann
R Kittredge
R Netzel
R Steinbrook
S Altschul
S Brady
S Buckingham
S Douglas
S Nelson
S Staab
T Jenssen
T Shtatland
T Vanhecke
W Baumgartner
W Xuan
W Zhou
Y Fang
Y Yamamoto
Z Harris
Publication venue: BioMed Central

Efficient access to information contained in online scientific literature collections is essential for life science research, playing a crucial role from the initial stage of experiment planning to the final interpretation and communication of the results. The biological literature also constitutes the main information source for the manual literature curation used by expert-curated databases. Following the increasing popularity of web-based applications for analyzing biological data, new text-mining and information extraction strategies are being implemented. These systems exploit existing regularities in natural language to extract biologically relevant information from electronic texts automatically. The aim of the BioCreative challenge is to promote the development of such tools and to provide insight into their performance. This review presents a general introduction to the main characteristics and applications of currently available text-mining systems for the life sciences in terms of the following: the type of biological information demands being addressed; the level of information granularity of both user queries and results; and the features and methods commonly exploited by these applications. The current trend in biomedical text mining points toward increasing diversification in terms of application types and techniques, together with the integration of domain-specific resources such as ontologies. Additional descriptions of some of the systems discussed here are available online.

Crossref

PubMed Central

Biomedical Discovery Acceleration, with Applications to Craniofacial Development

Author: A Amano
A Baumeister
A Cvekl
A Ferrer-Martinez
A Gabow
A Gavalas
A Hollnagel
A Jaimovich
A Karimpour-Fard
A L'Honore
A Nakaya
A Nazarali
A Subramanian
A Visel
A Yamane
A Zanzoni
AK Ramani
AM Edwards
AY Sivachenko
B Kanzler
BJ Daigle Jr
BT Alako
C Faloutsos
C North
C von Mering
CH Yeang
CL Myers
CM Deane
D Barker
D Eisenberg
D Hanisch
D Hwang
DJ Reiss
DP Hill
DP Tan
DR Rhodes
DS Goldberg
E Nabieva
E Segal
E Sprinzak
E Wingender
EM Marcotte
F Cozman
F Sohler
FM Rijli
GD Bader
GR Lanckriet
H Hishigaki
H Ogata
H Suzuki
H Tipney
Hannah Tipney
HJ Drabkin
HY Chuang
I Iossifov
I Lee
I Xenarios
J Chen
J Cui
J Graw
J Kim
J Li
J Sun
JP Vert
JR Barrow
JS Bader
JT Eppig
L Hedges
L Hunter
L Li
L Salwinski
Lawrence Hunter
M Ashburner
M Bada
M Donalies
M Downes
M Gendron-Maguire
M Kanai-Azuma
M Kanehisa
M Krallinger
M Maconochie
MC Mikl
MP Smidt
MS Scott
MY Galperin
N Daraselia
N Nariai
OG Troyanskaya
P Dupont
P Hunt
P Lipton
P Pei
P Saraiya
P Shannon
PA Gray
PM Bowers
Priyanka Kasliwal
PW Lord
R Bellazzi
R Hoffman
R Jansen
R Saito
Richard A. Spritz
Ronald P. Schuyler
S Asthana
S Brewer
S Draghici
S Imoto
S Kerrien
S Leach
Satoru Miyano
Sonia M. Leach
T Ideker
T Matsumoto
T Schlitt
Trevor Williams
V Ferretti
W Feng
WA Baumgartner
WA Baumgartner Jr
Weiguo Feng
William A. Baumgartner
X Yang
Y Chen
Y Kamei
Y Nakayama
Y Yamanishi
Publication venue: Public Library of Science
Publication date: 01/03/2009

The profusion of high-throughput instruments and the explosion of new results in the scientific literature, particularly in molecular biomedicine, are both a blessing and a curse to the bench researcher. Even knowledgeable and experienced scientists can benefit from computational tools that help navigate this vast and rapidly evolving terrain. In this paper, we describe a novel computational approach to this challenge: a knowledge-based system that combines reading, reasoning, and reporting methods to facilitate the analysis of experimental data. Reading methods extract information from external resources, either by parsing structured data or by using biomedical language processing to extract information from unstructured data, and they track knowledge provenance. Reasoning methods enrich the knowledge that results from reading by, for example, noting that two genes are annotated to the same ontology term or database entry. Reasoning is also used to combine all sources into a knowledge network that integrates all types of relationships between a pair of genes, and to calculate a combined reliability score. Reporting methods combine the knowledge network with a congruent network constructed from experimental data and visualize the combined network in a tool that facilitates knowledge-based analysis of those data. An implementation of this approach, called the Hanalyzer, is demonstrated on a large-scale gene expression array dataset relevant to craniofacial development. The tool was critical in generating hypotheses about the roles of four genes never previously characterized as involved in craniofacial development; each of these hypotheses was validated by further experimental work.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central
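The Hanalyzer abstract mentions combining all sources into a knowledge network with a combined reliability score for each gene pair, without giving the formula here. A common way to combine independent evidence is a noisy-OR, 1 - Π(1 - r_i), where r_i is the reliability of source i; the sketch below assumes that rule for illustration and should not be read as the Hanalyzer's exact method. The function name and source names are hypothetical.

```python
import math


def combined_reliability(source_scores):
    """Noisy-OR combination: probability that at least one source's link is correct.

    `source_scores` maps a source name to its reliability in [0, 1];
    sources are assumed to be independent.
    """
    for name, r in source_scores.items():
        if not 0.0 <= r <= 1.0:
            raise ValueError(f"reliability for {name!r} must be in [0, 1]")
    # Product of the probabilities that each source is wrong.
    return 1.0 - math.prod(1.0 - r for r in source_scores.values())
```

Under this assumed rule, two independent sources that are each 50% reliable combine to 75%, and adding a source can never lower the combined score.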

CoPub: a literature-based keyword enrichment tool for microarray data analysis

Author: Alako
B. Heupers
Blaschke
Chagoyen
Chaussabel
Chung
Dahlquist
Frijters
J. de Vlieg
J. Polman
Jelier
Jenssen
Kuffner
M. Bouwhuis
Mlecnik
Nakao
P. van Beek
R. Frijters
R. van Schaik
Rubinstein
W. Alkema
Wu
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2008

Medline is a rich information source from which links between genes and keywords describing biological processes, pathways, drugs, pathologies and diseases can be extracted. We developed a publicly available tool called CoPub that uses the information in the Medline database for the biological interpretation of microarray data. CoPub allows batch input of multiple human, mouse or rat genes and produces lists of keywords from several biomedical thesauri that are significantly correlated with the set of input genes. These lists link to Medline abstracts in which the co-occurring input genes and correlated keywords are highlighted. Furthermore, CoPub can graphically visualize differentially expressed genes and over-represented keywords in a network, providing detailed insight into the relationships between genes and keywords and revealing the most influential genes as highly connected hubs. CoPub is freely accessible at http://services.nbic.nl/cgi-bin/copub/CoPub.pl

Crossref

Radboud Repository
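CoPub reports keywords "significantly correlated" with an input gene set, but the statistical test is not named in this abstract. A one-sided hypergeometric (Fisher's exact) over-representation test is a common choice for this kind of enrichment and is assumed in the sketch below, with `population` as the number of background genes, `successes` as the background genes linked to a keyword, `draws` as the size of the input gene set, and `k` as the input genes linked to that keyword. The function name is hypothetical.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws).

    Used here as an assumed one-sided over-representation p-value for
    one keyword against an input gene set.
    """
    k = max(k, 0)
    total = comb(population, draws)
    upper = min(successes, draws)
    # Sum the probabilities of observing i = k..upper keyword-linked genes.
    return sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    ) / total
```

For instance, `hypergeom_sf(8, 20_000, 200, 50)` gives the probability of seeing at least 8 keyword-linked genes in a random 50-gene set; in practice such p-values would also need correction for the many keywords tested.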

Automated cognome construction and semi-automated hypothesis generation

Author: Akil
Alako
Begg
Bilder
Björk
Bohland
Borsboom
Bowden
Bradley Voytek
Dirnagl
Evans
Ioannidis
Jessica B. Voytek
Larson
Lein
Michel
Modha
Parsons
Poldrack
Schmidt
Sporns
Stephan
Stern
Voytek
Wren
Yarkoni
Zhang
Publication venue: Elsevier BV

Crossref